Linear Thompson Sampling Revisited: A. Examples of TS distributions
Authors
Abstract
A. Examples of TS distributions

Example 1: Uniform distribution $\eta \sim \mathcal{U}(\mathcal{B}_d(0,\sqrt{d}))$. By definition, the uniform distribution satisfies the concentration property with constants $c = 1$ and $c' = e/d$. For any direction $u$ of $\mathbb{R}^d$, the set $\{\eta \mid u^\top \eta \geq 1\} \cap \mathcal{B}_d(0,\sqrt{d})$ is a hyper-spherical cap. The anti-concentration property is therefore satisfied provided that the ratio between two volumes is lower-bounded by a constant independent of $d$: the volume of a hyper-spherical cap of height $\sqrt{d} - 1$, and the volume of the ball of radius $\sqrt{d}$. Using standard geometric results (see Prop. 9), one has that for any vector $u$ with $\|u\| = 1$, $\mathbb{P}(u^\top \eta \geq 1) = 1$...
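The two properties in Example 1 can be checked numerically with a short sketch. It assumes the concentration property has the form $\mathbb{P}(\|\eta\| \le \sqrt{c\,d\log(c' d/\delta)}) \ge 1-\delta$ and the anti-concentration property asks for a $d$-independent lower bound on $\mathbb{P}(u^\top \eta \ge 1)$. Under that reading, $c = 1$, $c' = e/d$ gives the radius $\sqrt{d\log(e/\delta)} \ge \sqrt{d}$, which the uniform sample never exceeds. The helper names (`sample_uniform_ball`, `concentration_radius`, `anti_concentration_estimate`) are hypothetical and serve only as illustration. The cap probability is estimated by Monte Carlo rather than with the closed form of Prop. 9.

```python
import numpy as np


def sample_uniform_ball(n, d, radius, rng):
    """Draw n points uniformly from the d-dimensional ball B_d(0, radius)."""
    # Uniform direction on the sphere: normalise a standard Gaussian vector.
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # The radius has density proportional to r^(d-1), i.e. r = radius * U^(1/d).
    r = radius * rng.random(n) ** (1.0 / d)
    return directions * r[:, None]


def concentration_radius(d, delta, c=1.0, c_prime=None):
    """sqrt(c * d * log(c' * d / delta)), with the default constants c = 1, c' = e / d."""
    if c_prime is None:
        c_prime = np.e / d
    return np.sqrt(c * d * np.log(c_prime * d / delta))


def anti_concentration_estimate(d, n=100_000, seed=0):
    """Monte Carlo estimate of P(u^T eta >= 1) for eta ~ U(B_d(0, sqrt(d)))."""
    rng = np.random.default_rng(seed)
    eta = sample_uniform_ball(n, d, np.sqrt(d), rng)
    u = np.zeros(d)
    u[0] = 1.0  # by rotational symmetry every unit vector gives the same probability
    return float(np.mean(eta @ u >= 1.0))


if __name__ == "__main__":
    for d in (2, 5, 10, 20):
        print(d, anti_concentration_estimate(d))
```

As a sanity check, for $d = 2$ the cap is a circular segment. The exact fraction is $(\pi/2 - 1)/(2\pi) \approx 0.091$, which the estimate should reproduce to about two decimal places. For $d = 1$ the cap has height $\sqrt{1} - 1 = 0$ and the probability is zero, so the argument concerns $d \ge 2$.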
Similar resources
Linear Thompson Sampling Revisited
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic linear bandit setting. While we obtain a regret bound of order $\tilde{O}(d^{3/2}\sqrt{T})$ as in previous results, the proof sheds new light on the functioning of TS. We leverage the structure of the problem to show how the regret is related to the sensitivity (i.e., the gradient) of the objective function and h...
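To make the linear TS setting concrete, here is a minimal sketch of a generic linear TS loop over a finite action set. It assumes the following:

- a ridge-regression estimate $\hat\theta = V^{-1} b$ with $V = \lambda I + \sum_s x_s x_s^\top$;
- a perturbed parameter $\tilde\theta = \hat\theta + \beta V^{-1/2}\eta$, where $\eta$ is standard Gaussian;
- play of the action maximising $x^\top \tilde\theta$.

The function `linear_ts` and its parameters (`beta`, `lam`, `noise_sd`) are hypothetical illustration choices. They do not reproduce the paper's exact oversampling factor or analysis.

```python
import numpy as np


def linear_ts(actions, theta_star, horizon, beta=1.0, lam=1.0, noise_sd=0.1, seed=0):
    """Run a linear TS loop on a fixed finite action set and return cumulative regret."""
    rng = np.random.default_rng(seed)
    _, d = actions.shape
    V = lam * np.eye(d)  # regularised design matrix
    b = np.zeros(d)      # sum of reward-weighted actions
    best_value = np.max(actions @ theta_star)
    regret = 0.0
    for _ in range(horizon):
        theta_hat = np.linalg.solve(V, b)  # ridge (regularised least-squares) estimate
        # Symmetric inverse square root of V via its eigendecomposition.
        w, Q = np.linalg.eigh(V)
        V_inv_sqrt = Q @ np.diag(1.0 / np.sqrt(w)) @ Q.T
        eta = rng.standard_normal(d)  # Gaussian TS perturbation (an assumption)
        theta_tilde = theta_hat + beta * V_inv_sqrt @ eta
        x = actions[np.argmax(actions @ theta_tilde)]  # greedy w.r.t. the sampled parameter
        reward = x @ theta_star + noise_sd * rng.standard_normal()
        V += np.outer(x, x)
        b += reward * x
        regret += best_value - x @ theta_star
    return float(regret)
```

For example, `linear_ts(np.eye(3), np.array([1.0, 0.5, 0.0]), horizon=2000)` should return a regret far below the roughly $2000 \times 0.5$ incurred by always playing the second-best action.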
Optimality of Thompson Sampling for Gaussian Bandits Depends on Priors
In stochastic bandit problems, a Bayesian policy called Thompson sampling (TS) has recently attracted much attention for its excellent empirical performance. However, the theoretical analysis of this policy is difficult and its asymptotic optimality is only proved for one-parameter models. In this paper we discuss the optimality of TS for the model of normal distributions with unknown means and...
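To show where the prior enters TS for Gaussian arms with unknown means and variances, here is a minimal sketch. It uses a conjugate normal-inverse-gamma prior $(\mu_0, \kappa_0, a_0, b_0)$, chosen purely for illustration; the priors studied in the paper may differ. Each round, the sketch does the following for every arm:

1. Draw $\sigma^2 \sim \text{Inv-Gamma}(a_n, b_n)$.
2. Draw $\mu \sim \mathcal{N}(\mu_n, \sigma^2/\kappa_n)$.
3. Play the arm with the largest sampled mean.

The function `gaussian_ts` and its default hyperparameters are hypothetical.

```python
import numpy as np


def gaussian_ts(true_means, true_sds, horizon, mu0=0.0, kappa0=1.0, a0=1.0, b0=1.0, seed=0):
    """TS for Gaussian arms with a normal-inverse-gamma prior; returns pull counts."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    sumsqs = np.zeros(k)
    for _ in range(horizon):
        n = counts
        xbar = np.divide(sums, n, out=np.zeros(k), where=n > 0)
        ss = np.maximum(sumsqs - n * xbar**2, 0.0)  # sum of squared deviations
        # Standard conjugate normal-inverse-gamma posterior update.
        kappa_n = kappa0 + n
        mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
        a_n = a0 + n / 2
        b_n = b0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2 * kappa_n)
        sigma2 = 1.0 / rng.gamma(a_n, 1.0 / b_n)  # Inv-Gamma(a_n, b_n) via 1 / Gamma(shape, scale)
        theta = rng.normal(mu_n, np.sqrt(sigma2 / kappa_n))
        arm = int(np.argmax(theta))
        y = rng.normal(true_means[arm], true_sds[arm])
        counts[arm] += 1
        sums[arm] += y
        sumsqs[arm] += y * y
    return counts
```

Changing `kappa0`, `a0`, or `b0` changes how aggressively poorly sampled arms are explored. The abstract argues that this kind of choice is exactly what determines optimality.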
Stochastic Regret Minimization via Thompson Sampling
The Thompson Sampling (TS) policy is a widely implemented algorithm for the stochastic multi-armed bandit (MAB) problem. Given a prior distribution over possible parameter settings of the underlying reward distributions of the arms, at each time instant, the policy plays an arm with probability equal to the probability that this arm has largest mean reward conditioned on the current posterior di...
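The selection rule described here can be implemented by drawing one sample per arm from its posterior and playing the argmax. The probability that a given arm wins this draw is exactly its posterior probability of having the largest mean. The following is a minimal sketch assuming Bernoulli rewards and independent Beta(1, 1) priors. The function `bernoulli_ts` is a hypothetical name for illustration.

```python
import numpy as np


def bernoulli_ts(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling; returns the number of pulls of each arm."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # 1 + observed successes (Beta(1, 1) prior)
    beta = np.ones(k)   # 1 + observed failures
    pulls = np.zeros(k, dtype=int)
    for _ in range(horizon):
        # One posterior draw per arm; playing the argmax selects each arm with
        # its posterior probability of having the largest mean reward.
        samples = rng.beta(alpha, beta)
        arm = int(np.argmax(samples))
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
    return pulls
```

With `bernoulli_ts([0.2, 0.5, 0.8], horizon=3000)`, most pulls should go to the third arm, while the others are still sampled occasionally.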
Thompson Sampling for Linear-Quadratic Control Problems
We consider the exploration-exploitation tradeoff in linear quadratic (LQ) control problems, where the state dynamics is linear and the cost function is quadratic in states and controls. We analyze the regret of Thompson sampling (TS) (a.k.a. posterior-sampling for reinforcement learning) in the frequentist setting, i.e., when the parameters characterizing the LQ dynamics are fixed. Despite the...